Overview

Dataset statistics

Number of variables: 11
Number of observations: 500000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 70.7 MiB
Average record size in memory: 148.2 B
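The average record size follows from the total memory footprint and the row count. The sketch below redoes that arithmetic from the figures printed above; the variable names are illustrative, and because the total is printed with one decimal place the result only approximately matches the reported 148.2 B.

```python
# Figures as printed in the report; the total is rounded to one decimal place.
total_mib = 70.7
n_rows = 500_000

bytes_per_mib = 1024 ** 2  # 1 MiB = 1,048,576 bytes
avg_record_bytes = total_mib * bytes_per_mib / n_rows

print(f"{avg_record_bytes:.1f} B per record")
```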

Variable types

NUM: 10
CAT: 1

Reproduction

Analysis started: 2020-07-02 22:11:18.813825
Analysis finished: 2020-07-03 00:18:34.924549
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
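The two timestamps give the wall-clock duration of the profiling run. A minimal sketch using only the standard library, with the timestamps copied from the Reproduction block:

```python
from datetime import datetime

# Timestamp layout used in the Reproduction block.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-07-02 22:11:18.813825", fmt)
finished = datetime.strptime("2020-07-03 00:18:34.924549", fmt)

elapsed = finished - started  # a datetime.timedelta
print(elapsed)                  # hours:minutes:seconds
print(elapsed.total_seconds())  # the same duration in seconds
```

The run took a little over two hours for 500000 rows and 11 variables.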

Warnings

tour_id is highly correlated with los_order_id (High Correlation)
los_order_id is highly correlated with tour_id (High Correlation)
pick_diff_seconds is highly skewed (γ1 = 401.8300644) (Skewed)
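The report does not say which correlation coefficient triggered the High Correlation warning. As an illustration only, the sketch below computes a Pearson coefficient for two ID-like columns that grow together; the helper `pearson` and the toy values are hypothetical and are not taken from the dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: several orders share one tour, and both IDs increase over time.
order_ids = [100, 101, 102, 103, 104, 105]
tour_ids = [10, 10, 11, 11, 12, 12]

r = pearson(order_ids, tour_ids)
print(f"r = {r:.3f}")
```

Two sequentially assigned identifiers tend to correlate strongly for this reason alone, so the warning may reflect how the IDs are issued rather than a meaningful relationship.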

Variables

df_index
Real number (ℝ≥0)

UNIQUE
Distinct count: 500000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5408111.334702
Minimum: 15
Maximum: 10801357
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 15
5-th percentile: 542087.7
Q1: 2703968.75
Median: 5412437.5
Q3: 8119263.25
95-th percentile: 10262928.45
Maximum: 10801357
Range: 10801342
Interquartile range (IQR): 5415294.5

Descriptive statistics

Standard deviation: 3120164.292
Coefficient of variation (CV): 0.5769415788
Kurtosis: -1.201953544
Mean: 5408111.335
Median Absolute Deviation (MAD): 2702774.11
Skewness: -0.002616102165
Sum: 2.704055667e+12
Variance: 9.735425207e+12
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.5000000e+01 1.0801357e+07], "bayesian blocks" binning strategy used)
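Several entries in the quantile and descriptive blocks are derived from others. The sketch below recomputes range, IQR, coefficient of variation and variance for df_index from the printed values. The inputs are rounded as shown in the report, so the derived figures agree only to the printed precision.

```python
# Values copied from the df_index statistics above.
minimum, maximum = 15, 10801357
q1, q3 = 2703968.75, 8119263.25
mean, std = 5408111.335, 3120164.292

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 6), f"{variance:.6e}")
```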
Value                   Count    Frequency (%)
2101246                 1        < 0.1%
10165711                1        < 0.1%
6016469                 1        < 0.1%
6809457                 1        < 0.1%
1813977                 1        < 0.1%
2866651                 1        < 0.1%
759262                  1        < 0.1%
9149919                 1        < 0.1%
2784739                 1        < 0.1%
679399                  1        < 0.1%
Other values (499990)   499990   > 99.9%

Value                   Count    Frequency (%)
15                      1        < 0.1%
53                      1        < 0.1%
57                      1        < 0.1%
65                      1        < 0.1%
86                      1        < 0.1%

Value                   Count    Frequency (%)
10801357                1        < 0.1%
10801352                1        < 0.1%
10801340                1        < 0.1%
10801319                1        < 0.1%
10801308                1        < 0.1%

los_order_id
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 337695
Unique (%): 67.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2173319.702244
Minimum: 1677849.0
Maximum: 2675964.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1677849
5-th percentile: 1726295.95
Q1: 1919151.25
Median: 2174486
Q3: 2425038.25
95-th percentile: 2624770.05
Maximum: 2675964
Range: 998115
Interquartile range (IQR): 505887

Descriptive statistics

Standard deviation: 289541.3517
Coefficient of variation (CV): 0.1332253839
Kurtosis: -1.215069275
Mean: 2173319.702
Median Absolute Deviation (MAD): 251091.5691
Skewness: 0.007832943834
Sum: 1.086659851e+12
Variance: 8.383419435e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1677849. 1679656. 1680688.5 1680694.5 1681488.5 ... 2674356. 2675179. 2675637.5 2675716.5 2675964. ], "bayesian blocks" binning strategy used)
Value                   Count    Frequency (%)
2307491                 14       < 0.1%
2300937                 13       < 0.1%
2110108                 13       < 0.1%
1690476                 13       < 0.1%
2364725                 12       < 0.1%
1911940                 12       < 0.1%
2369536                 11       < 0.1%
1914141                 11       < 0.1%
2326480                 11       < 0.1%
2609546                 10       < 0.1%
Other values (337685)   499880   > 99.9%

Value                   Count    Frequency (%)
1677849                 2        < 0.1%
1677862                 1        < 0.1%
1677868                 1        < 0.1%
1677901                 1        < 0.1%
1677907                 1        < 0.1%

Value                   Count    Frequency (%)
2675964                 1        < 0.1%
2675942                 1        < 0.1%
2675926                 2        < 0.1%
2675910                 1        < 0.1%
2675908                 1        < 0.1%

art_id
Real number (ℝ≥0)

Distinct count: 18914
Unique (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 490912.195586
Minimum: 1.0
Maximum: 999987.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 41162
Q1: 343299
Median: 498915
Q3: 630876
95-th percentile: 863077.05
Maximum: 999987
Range: 999986
Interquartile range (IQR): 287577

Descriptive statistics

Standard deviation: 238944.6301
Coefficient of variation (CV): 0.4867359831
Kurtosis: -0.4605155953
Mean: 490912.1956
Median Absolute Deviation (MAD): 188292.9104
Skewness: -0.1503300642
Sum: 2.454560978e+11
Variance: 5.709453626e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.000000e+00 9.100000e+01 1.080000e+02 1.110000e+02 1.445000e+02 ... 9.996325e+05 9.999825e+05 9.999855e+05 9.999865e+05 9.999870e+05], "bayesian blocks" binning strategy used)
Value                   Count    Frequency (%)
599399                  2095     0.4%
594430                  1613     0.3%
39152                   1547     0.3%
304713                  1523     0.3%
361703                  1486     0.3%
595420                  1441     0.3%
46443                   1206     0.2%
651233                  1171     0.2%
28171                   1130     0.2%
45953                   1077     0.2%
Other values (18904)    485711   97.1%

Value                   Count    Frequency (%)
1                       4        < 0.1%
76                      14       < 0.1%
106                     112      < 0.1%
110                     77       < 0.1%
112                     160      < 0.1%

Value                   Count    Frequency (%)
999987                  107      < 0.1%
999986                  89       < 0.1%
999985                  19       < 0.1%
999980                  1        < 0.1%
999977                  1        < 0.1%

to_pick
Real number (ℝ≥0)

Distinct count: 120
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0507
Minimum: 1.0
Maximum: 270.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 270
Range: 269
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 4.332067294
Coefficient of variation (CV): 2.112482223
Kurtosis: 402.4605722
Mean: 2.0507
Median Absolute Deviation (MAD): 1.490519612
Skewness: 15.3045627
Sum: 1025350
Variance: 18.76680704
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 3.5 4.5 ... 119. 125. 147. 150.5 270. ], "bayesian blocks" binning strategy used)
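The MAD figures for the skewed variables deserve a sanity check. About half of the observations lie between Q1 and Q3, so a median absolute deviation cannot exceed the larger of the two distances from the median to a quartile. The sketch below applies that bound to the printed values; the helper name is illustrative. Where the reported figure exceeds the bound, one plausible explanation (an assumption, not confirmed by the report) is that the value is a mean absolute deviation rather than a median one.

```python
# (Q1, median, Q3, reported MAD) as printed in this report.
reported = {
    "to_pick": (1, 1, 2, 1.490519612),
    "weight": (0.091, 0.238, 0.504, 0.4731781183),
    "volume": (330, 563.58, 1416.14, 2480.923836),
}

def median_abs_dev_upper_bound(q1, median, q3):
    """Roughly half the data lies in [Q1, Q3], so the median absolute
    deviation is at most the larger distance from the median to a quartile."""
    return max(median - q1, q3 - median)

# True where the reported MAD is too large to be a median absolute deviation.
exceeds = {
    name: mad > median_abs_dev_upper_bound(q1, med, q3)
    for name, (q1, med, q3, mad) in reported.items()
}
print(exceeds)
```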
Value                   Count    Frequency (%)
1                       350883   70.2%
2                       78050    15.6%
3                       24015    4.8%
4                       14176    2.8%
5                       8042     1.6%
6                       7103     1.4%
10                      3683     0.7%
12                      2910     0.6%
8                       2303     0.5%
24                      855      0.2%
Other values (110)      7980     1.6%

Value                   Count    Frequency (%)
1                       350883   70.2%
2                       78050    15.6%
3                       24015    4.8%
4                       14176    2.8%
5                       8042     1.6%

Value                   Count    Frequency (%)
270                     1        < 0.1%
268                     1        < 0.1%
242                     1        < 0.1%
201                     1        < 0.1%
200                     1        < 0.1%

weight
Real number (ℝ≥0)

Distinct count: 1349
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.49411269000000013
Minimum: 0.0
Maximum: 20.0
Zeros: 5
Zeros (%): < 0.1%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.022
Q1: 0.091
Median: 0.238
Q3: 0.504
95-th percentile: 1.648
Maximum: 20
Range: 20
Interquartile range (IQR): 0.413

Descriptive statistics

Standard deviation: 0.8914900184
Coefficient of variation (CV): 1.80422409
Kurtosis: 31.45272506
Mean: 0.49411269
Median Absolute Deviation (MAD): 0.4731781183
Skewness: 4.761819236
Sum: 247056.345
Variance: 0.7947544528
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-04 2.50000e-03 3.50000e-03 5.50000e-03 ... 1.47900e+01 1.50890e+01 1.75785e+01 1.99995e+01 2.00000e+01], "bayesian blocks" binning strategy used)
Value                   Count    Frequency (%)
0.298                   7086     1.4%
0.321                   5746     1.1%
0.107                   5095     1.0%
0.34                    5030     1.0%
0.094                   4140     0.8%
0.1                     4013     0.8%
0.026                   3862     0.8%
0.08                    3453     0.7%
0.28                    3292     0.7%
0.02                    3019     0.6%
Other values (1339)     455264   91.1%

Value                   Count    Frequency (%)
0                       5        < 0.1%
0.001                   48       < 0.1%
0.002                   41       < 0.1%
0.003                   173      < 0.1%
0.004                   904      0.2%

Value                   Count    Frequency (%)
20                      6        < 0.1%
19.999                  1        < 0.1%
15.158                  25       < 0.1%
15.02                   15       < 0.1%
14.56                   1        < 0.1%

volume
Real number (ℝ≥0)

Distinct count: 6868
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2104.77598724
Minimum: 0.2
Maximum: 144500.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 0.2
5-th percentile: 65.34
Q1: 330
Median: 563.58
Q3: 1416.14
95-th percentile: 9672
Maximum: 144500
Range: 144499.8
Interquartile range (IQR): 1086.14

Descriptive statistics

Standard deviation: 5546.980824
Coefficient of variation (CV): 2.635425745
Kurtosis: 45.29662739
Mean: 2104.775987
Median Absolute Deviation (MAD): 2480.923836
Skewness: 5.708902834
Sum: 1052387994
Variance: 30768996.26
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2.0000000e-01 3.8500000e-01 4.4500000e-01 5.1500000e-01 6.3500000e-01 ... 5.4007000e+04 6.6831000e+04 8.1970685e+04 1.2611000e+05 1.4450000e+05], "bayesian blocks" binning strategy used)
Value                   Count    Frequency (%)
353.65                  8144     1.6%
360                     6784     1.4%
498.4                   4175     0.8%
384.58                  3320     0.7%
392.09                  2962     0.6%
438.22                  2788     0.6%
311.36                  2673     0.5%
1720.4                  2278     0.5%
468.72                  1901     0.4%
840.84                  1894     0.4%
Other values (6858)     463081   92.6%

Value                   Count    Frequency (%)
0.2                     14       < 0.1%
0.37                    5        < 0.1%
0.38                    3        < 0.1%
0.39                    25       < 0.1%
0.44                    30       < 0.1%

Value                   Count    Frequency (%)
144500                  1        < 0.1%
144000                  2        < 0.1%
129500                  1        < 0.1%
122720                  13       < 0.1%
113100                  1        < 0.1%

tour_id
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 117823
Unique (%): 23.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 279091.286868
Minimum: 207302.0
Maximum: 355562.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 207302
5-th percentile: 214109
Q1: 241799.5
Median: 278300
Q3: 315534
95-th percentile: 347115
Maximum: 355562
Range: 148260
Interquartile range (IQR): 73734.5

Descriptive statistics

Standard deviation: 42692.69294
Coefficient of variation (CV): 0.152970354
Kurtosis: -1.194966382
Mean: 279091.2869
Median Absolute Deviation (MAD): 36974.18322
Skewness: 0.05875461569
Sum: 1.395456434e+11
Variance: 1822666031
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[207302. 207357.5 207391.5 207412.5 207431.5 ... 355094. 355097.5 355337.5 355502.5 355562. ], "bayesian blocks" binning strategy used)
Value                   Count    Frequency (%)
213648                  38       < 0.1%
312910                  30       < 0.1%
296215                  28       < 0.1%
227662                  27       < 0.1%
328010                  27       < 0.1%
253985                  27       < 0.1%
210787                  27       < 0.1%
319372                  27       < 0.1%
352164                  26       < 0.1%
310187                  26       < 0.1%
Other values (117813)   499717   99.9%

Value                   Count    Frequency (%)
207302                  5        < 0.1%
207305                  4        < 0.1%
207316                  1        < 0.1%
207321                  2        < 0.1%
207329                  3        < 0.1%

Value                   Count    Frequency (%)
355562                  2        < 0.1%
355521                  2        < 0.1%
355511                  2        < 0.1%
355503                  1        < 0.1%
355502                  1        < 0.1%

box_position_on_cart
Real number (ℝ≥0)

Distinct count: 42
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.401244
Minimum: 1.0
Maximum: 42.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 5
Q3: 8
95-th percentile: 18
Maximum: 42
Range: 41
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 6.194888539
Coefficient of variation (CV): 0.9677632252
Kurtosis: 10.54646666
Mean: 6.401244
Median Absolute Deviation (MAD): 3.994000729
Skewness: 2.863554502
Sum: 3200622
Variance: 38.37664401
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 3.5 4.5 5.5 ... 25.5 33.5 40.5 41.5 42. ], "bayesian blocks" binning strategy used)
Value                   Count    Frequency (%)
4                       72709    14.5%
3                       60448    12.1%
2                       59980    12.0%
1                       53222    10.6%
8                       46433    9.3%
7                       39243    7.8%
6                       38699    7.7%
5                       36632    7.3%
12                      13995    2.8%
11                      12324    2.5%
Other values (32)       66315    13.3%

Value                   Count    Frequency (%)
1                       53222    10.6%
2                       59980    12.0%
3                       60448    12.1%
4                       72709    14.5%
5                       36632    7.3%

Value                   Count    Frequency (%)
42                      993      0.2%
41                      901      0.2%
40                      787      0.2%
39                      795      0.2%
38                      749      0.1%

box_type
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB
Value                   Count    Frequency (%)
SCHWARZ                 210295   42.1%
WEISS                   116755   23.4%
GRÜN                    96967    19.4%
ROT                     45850    9.2%
BLAU                    24914    5.0%
GELB                    3287     0.7%
onePos_LX               1932     0.4%
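The frequency column is each count divided by the number of observations. The sketch below recomputes it for box_type from the counts in the table above and confirms that the seven categories account for all 500000 rows.

```python
# Counts copied from the box_type frequency table.
counts = {
    "SCHWARZ": 210295,
    "WEISS": 116755,
    "GRÜN": 96967,
    "ROT": 45850,
    "BLAU": 24914,
    "GELB": 3287,
    "onePos_LX": 1932,
}

total = sum(counts.values())
percentages = {name: 100 * n / total for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name:<10} {counts[name]:>7} {pct:5.1f}%")
print("total rows:", total)
```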
 

Length

Max length: 9
Mean length: 5.4229
Min length: 3

Value                   Count    Frequency (%)
Uppercase_Letter        19       79.2%
Lowercase_Letter        4        16.7%
Connector_Punctuation   1        4.2%

Value                   Count    Frequency (%)
Latin                   23       95.8%
Common                  1        4.2%

Value                   Count    Frequency (%)
ASCII                   23       100.0%

pick_diff_microseconds
Real number (ℝ≥0)

Distinct count: 393280
Unique (%): 78.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499708.399822
Minimum: 1
Maximum: 999999
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 50155.85
Q1: 248883.75
Median: 499423.5
Q3: 750289.5
95-th percentile: 950241.2
Maximum: 999999
Range: 999998
Interquartile range (IQR): 501405.75

Descriptive statistics

Standard deviation: 288958.9284
Coefficient of variation (CV): 0.5782550954
Kurtosis: -1.204729598
Mean: 499708.3998
Median Absolute Deviation (MAD): 250449.8706
Skewness: 0.001853901889
Sum: 2.498541999e+11
Variance: 8.349726231e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000e+00 9.99999e+05], "bayesian blocks" binning strategy used)
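The values of pick_diff_microseconds span 1 to 999999 and are close to uniform, which is what the sub-second component of a time difference would look like. The sketch below illustrates that reading with Python's `timedelta`; treating this column as the `.microseconds` component and pick_diff_seconds as the total duration is an assumption, and the sample interval is hypothetical.

```python
from datetime import timedelta

# Hypothetical pick interval, not a row from the dataset.
delta = timedelta(seconds=4, microseconds=889792)

micro_part = delta.microseconds        # sub-second component, 0..999999
total_seconds = delta.total_seconds()  # whole duration as a float

# Consistency check on the reported means, under the same assumption:
# mean(total seconds) minus mean(microseconds) / 1e6 would be the mean
# whole-second part.
mean_seconds = 9.644508399822
mean_micro = 499708.399822
implied_whole_seconds_mean = mean_seconds - mean_micro / 1e6

print(micro_part, total_seconds, round(implied_whole_seconds_mean, 6))
```

Under that assumption the implied mean of the whole-second part has only four decimal places, which is what an average of integers over 500000 rows would produce.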
Value                   Count    Frequency (%)
34593                   7        < 0.1%
164315                  6        < 0.1%
403997                  6        < 0.1%
682555                  6        < 0.1%
241388                  6        < 0.1%
539310                  6        < 0.1%
600301                  6        < 0.1%
879268                  6        < 0.1%
983162                  6        < 0.1%
989026                  6        < 0.1%
Other values (393270)   499939   > 99.9%

Value                   Count    Frequency (%)
1                       1        < 0.1%
3                       1        < 0.1%
4                       1        < 0.1%
5                       1        < 0.1%
7                       1        < 0.1%

Value                   Count    Frequency (%)
999999                  1        < 0.1%
999998                  1        < 0.1%
999995                  1        < 0.1%
999993                  1        < 0.1%
999991                  1        < 0.1%

pick_diff_seconds
Real number (ℝ≥0)

SKEWED
Distinct count: 489999
Unique (%): 98.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.644508399822
Minimum: 0.134724
Maximum: 33881.391251
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 0.134724
5-th percentile: 1.4420235
Q1: 2.74092575
Median: 4.8897925
Q3: 10.19369125
95-th percentile: 32.27921385
Maximum: 33881.39125
Range: 33881.25653
Interquartile range (IQR): 7.4527655

Descriptive statistics

Standard deviation: 68.10910549
Coefficient of variation (CV): 7.061957195
Kurtosis: 178475.6159
Mean: 9.6445084
Median Absolute Deviation (MAD): 8.133442207
Skewness: 401.8300644
Sum: 4822254.2
Variance: 4638.85025
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.34724000e-01 3.07114500e-01 5.22953500e-01 6.13565000e-01 6.71006000e-01 ... 2.56007883e+02 3.42307470e+02 5.94910133e+02 2.22628344e+03 3.38813913e+04], "bayesian blocks" binning strategy used)
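A skewness of about 400 with a median under 5 seconds points to a handful of extreme values, such as the three intervals above 17000 seconds. The sketch below implements the adjusted Fisher-Pearson sample skewness and shows how a single outlier moves it. Whether the report uses exactly this estimator is an assumption, and the toy samples are hypothetical.

```python
from math import sqrt

def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness: g1 scaled by sqrt(n(n-1)) / (n-2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1

# Hypothetical toy samples: a symmetric one, and the same with one huge value.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 5, 1000]

print(sample_skewness(symmetric))
print(sample_skewness(with_outlier))
```

For a long-tailed duration like this one, the median and the quantiles describe typical pick times better than the mean or the standard deviation.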
Value                   Count    Frequency (%)
2.173411                4        < 0.1%
1.629012                4        < 0.1%
2.343898                3        < 0.1%
8.276109                3        < 0.1%
2.519241                3        < 0.1%
1.628845                3        < 0.1%
2.648494                3        < 0.1%
5.328834                3        < 0.1%
2.876823                3        < 0.1%
4.622421                3        < 0.1%
Other values (489989)   499968   > 99.9%

Value                   Count    Frequency (%)
0.134724                1        < 0.1%
0.146618                1        < 0.1%
0.149407                1        < 0.1%
0.151895                1        < 0.1%
0.154498                1        < 0.1%

Value                   Count    Frequency (%)
33881.39125             1        < 0.1%
26646.31887             1        < 0.1%
17814.2364              1        < 0.1%
2243.197013             1        < 0.1%
2209.369868             1        < 0.1%

Interactions